EDA
Picture from codecademy
Please refresh the page if table or figure is not displayed.
Please use Code option on the top right corner to see the code or hide the code.
Please use the dark/light theme option on the top right corner to change the theme.
1 Crypto Market Overview
From 2023 to 2024, the cryptocurrency market in the US experienced significant fluctuations. Bitcoin, Ethereum, and other major coins witnessed both surges and corrections in their prices. This period saw increased regulatory scrutiny, impacting market sentiment and prices. Additionally, the emergence of new projects and technologies contributed to market dynamics, with some coins experiencing rapid growth while others faced challenges or faded into obscurity. A market overview is crucial to understanding trends, assessing risk, and making informed investment decisions. It provides insights into market sentiment, regulatory developments, technological advancements, and the performance of various coins. With the crypto market being highly volatile and influenced by numerous factors, a comprehensive overview helps investors navigate this complex landscape and identify opportunities amidst the risks.
1.1 Dataset Overview
We utilize Kaggle data encompassing prices and trading volumes of various cryptocurrencies, supplemented by calculations for M20 and M5 metrics. The dataset spans from January 1, 2023, to recent data in 2024. Below, we present a succinct overview along with descriptive statistics of the variables.
2 Importing Necessary Libraries
Code
import os
import requests
import pandas as pd
import numpy as np
import datetime
pd.set_option('display.float_format',lambda x : '%.2f'%x)
import warnings
warnings.filterwarnings('ignore')
import gc
import matplotlib.colors
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode
from datetime import datetime, timedelta
from decimal import ROUND_HALF_UP, Decimal
import plotly.figure_factory as ff3 Data Import and Preprocessing
Code
# Define the directory path
directory = '../data/historical_data2'
# Define the list of CSV files to import
csv_files = [
'bitcoin.csv',
'ethereum.csv',
'ethereum-classic.csv',
'binancecoin.csv',
'bitcoin-cash.csv',
'solana.csv',
'cardano.csv',
'ripple.csv',
'dogecoin.csv',
]
# Initialize an empty list to store dataframes
dfs = []
# Loop through the selected CSV files
for filename in csv_files:
file_path = os.path.join(directory, filename)
# Read each CSV file and append to the list
df = pd.read_csv(file_path)
# Convert 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'])
# Convert columns with scientific notation to float
df['total_volume'] = df['total_volume'].astype(float)
df['market_cap'] = df['market_cap'].astype(float)
dfs.append(df)
# Concatenate all dataframes in the list
df = pd.concat(dfs, ignore_index=False)
df.rename(columns={'total_volume': 'volume', 'coin_name': 'exchange'}, inplace=True)
# Convert date column to datetime
# Convert 'date' to datetime and sort
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by=['exchange', 'date'])
# Calculating M20 and M5
df['M20'] = df.groupby('exchange')['price'].transform(lambda x: x.rolling(window=20).mean())
df['M5'] = df.groupby('exchange')['price'].transform(lambda x: x.rolling(window=5).mean())
df = df.dropna()
#drop all date before 2023
df = df[df['date'] >= '2023-01-01']
df2 = df.copy()
df.drop(columns=['market_cap'], inplace=True)
#set index = False
df.reset_index(drop=True, inplace=True)
#df.head()| date | price | volume | exchange | M20 | M5 | |
|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | 246.66 | 17764481.57 | binancecoin | 249.45 | 245.89 |
| 1 | 2023-01-02 | 244.06 | 230405872.61 | binancecoin | 247.81 | 245.41 |
| 2 | 2023-01-03 | 245.43 | 393346445.62 | binancecoin | 246.48 | 245.69 |
| 3 | 2023-01-04 | 246.21 | 256645348.85 | binancecoin | 245.37 | 245.64 |
| 4 | 2023-01-05 | 259.13 | 713673930.47 | binancecoin | 245.47 | 248.30 |
Code
ds=df.describe()
ds=pd.DataFrame(ds)| price | volume | M20 | M5 | |
|---|---|---|---|---|
| count | 4068.00 | 4068.00 | 4068.00 | 4068.00 |
| mean | 3991.01 | 4369191270.33 | 3865.05 | 3963.70 |
| std | 11112.62 | 9275163197.03 | 10679.59 | 11109.04 |
| min | 0.06 | 17764481.57 | 0.06 | 0.06 |
| 25% | 0.51 | 255958245.78 | 0.51 | 0.51 |
| 50% | 25.40 | 626872638.48 | 24.32 | 24.81 |
| 75% | 322.32 | 2953179859.87 | 316.19 | 319.61 |
| max | 73097.77 | 96403762012.66 | 68124.91 | 71522.66 |
The table above summarizes statistics for price, volume, M20, and M5 across 4068 data points. Price, M20, and M5 share similar minimum and quartile values, suggesting possible correlation, with price ranging from 0.06 to 73,097.77 and M20 and M5 closely tracking this range. Volume’s statistics are significantly higher, with a mean in the billions and a maximum reaching nearly a trillion, indicating a different scale of measurement. The high standard deviations for all variables hint at substantial variability within the dataset.
4 Data Visualization - Crypto Price and Market Cap Over Time
Code
ex_data = df.copy()
market_cap = df2.copy()
# Assuming ex_data is your DataFrame with columns 'exchange', 'date', and 'price'
color_map = {
'bitcoin': '#FF9900',
'ethereum': '#3C3C3D',
'ethereum-classic': '#669073',
'binancecoin': '#F3BA2F',
'bitcoin-cash': '#4CC947',
'solana': '#00FFCD',
'cardano': '#3D9',
'ripple': '#006097',
'dogecoin': '#BA9F33',
}
import matplotlib.pyplot as plt
# Ensure the data is sorted by date for each cryptocurrency
ex_data.sort_values(by=['exchange', 'date'], inplace=True)
market_cap.sort_values(by=['exchange', 'date'], inplace=True)
# Create a figure and axes for the subplots
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(20, 10), sharex=True)
fig.suptitle('Crypto Price and Market Cap Over Time')
# Plot 1: Crypto Prices
for exchange, group in ex_data.groupby('exchange'):
axs[0].plot(group['date'], group['price'], label=exchange, color=color_map.get(exchange, 'black'))
axs[0].set_title('Crypto Price Over Time')
axs[0].set_yscale('log')
axs[0].set_ylabel('Close Price (USD)')
for label in axs[0].get_xticklabels():
label.set_rotation(45)
axs[0].legend()
#set plot size
fig.set_size_inches(11, 8, forward=True)
#set x title rotate
plt.xticks(rotation=45)
# Plot 2: Market Cap
for exchange, group in market_cap.groupby('exchange'):
axs[1].plot(group['date'], group['market_cap'], label=exchange, color=color_map.get(exchange, 'black'))
axs[1].set_title('Market Cap Over Time')
axs[1].set_yscale('log')
axs[1].set_ylabel('Market Cap (USD)')
for label in axs[1].get_xticklabels():
label.set_rotation(45)
axs[1].legend()
# Set the y-axis ticks and labels for the price plot
price_ticks = [0.1, 1, 10, 100, 1000, 10000, 100000]
price_tick_labels = ['0.1', '1', '10', '100', '1K', '10K', '100K']
axs[0].set_yticks(price_ticks)
axs[0].set_yticklabels(price_tick_labels)
# Since matplotlib does not handle 'B' and 'T' natively, we'll set these as labels.
market_cap_ticks = [2e9, 5e9, 10e9, 20e9, 50e9, 100e9, 200e9, 500e9, 1e12] # Convert B and T to numeric values
market_cap_tick_labels = ['2B', '5B', '10B', '20B', '50B', '100B', '200B', '500B', '1T']
axs[1].set_yticks(market_cap_ticks)
axs[1].set_yticklabels(market_cap_tick_labels)
# Automatically adjust subplot params so that the subplot(s) fits in to the figure area
plt.tight_layout(rect=[0, 0, 1, 0.96])
# Show the plot
plt.show()The charts depict the price and market capitalization of various cryptocurrencies over time, plotted on logarithmic scales to accommodate wide-ranging values. Bitcoin has the highest price and market cap, with Ethereum following. Most cryptocurrencies show an increase in both price and market cap over the observed period. Market caps for smaller cryptocurrencies like Dogecoin, Ripple, and Cardano are lower but increase steadily. Ethereum Classic, Solana, and Binance Coin also show growth. The right charts indicate that while prices fluctuate, market caps generally trend upwards, suggesting an increasing investment in these assets over time. The growth trajectory is consistent across both short-term and long-term views.
5 Data Visualization - Correlation Heatmap of Cryptocurrencies
Code
import plotly.graph_objects as go
fig = go.Figure()
# Update layout for a pure white background and no axes
fig.update_layout(
plot_bgcolor='white', # Set plot background to white
paper_bgcolor='white', # Set figure background to white
xaxis=dict(showgrid=False, zeroline=False, visible=False), # Hide x-axis
yaxis=dict(showgrid=False, zeroline=False, visible=False), # Hide y-axis
margin=dict(l=0, r=0, t=0, b=0) # Remove margins
)
# Set figure size to be very small
fig.update_layout(width=100, height=10)
# Show the plot
fig.show()Code
import plotly.graph_objects as go
df_pivot=df.pivot_table(index='date', columns='exchange', values='price').reset_index()
corr=df_pivot.corr().round(2)
mask=np.triu(np.ones_like(corr, dtype=bool))
c_mask = np.where(~mask, corr, 100)
c=[]
for i in c_mask.tolist()[1:]:
c.append([x for x in i if x != 100])
cor=c[::-1]
x=corr.index.tolist()[:-1]
y=corr.columns.tolist()[1:][::-1]
fig=ff.create_annotated_heatmap(z=cor, x=x, y=y,
hovertemplate='Correlation between %{x} and %{y} Coin = %{z}',
colorscale='viridis', name='')
fig.update_layout(template="plotly_white", title='Coin Correlation',height=800,width=900,
yaxis=dict(showgrid=False, autorange='reversed'),
xaxis=dict(showgrid=False))
#set plot size
fig.show()The chart presents the correlation coefficients between various cryptocurrencies. Bitcoin shows a high correlation with most coins, particularly Ethereum at 0.97, suggesting their market movements are closely related. Ethereum Classic also shows strong correlations with other coins, especially Ethereum at 0.86. Dogecoin has a noticeable correlation with Bitcoin at 0.78 and even higher with Binance Coin at 0.86. Ripple stands out for its low correlation with other cryptocurrencies, notably only 0.04 with Binance Coin, indicating its price movements are mostly independent from the others. Generally, the chart reflects interdependencies between these digital assets, with Bitcoin being a central reference point.
7 Data Visualization - Yearly Average Returns
Code
final_v1=df.copy()
final_v1["Return"] = final_v1.groupby("exchange")["price"].pct_change(1)
final_v1['date'] = pd.to_datetime(final_v1['date'],errors='coerce')
final_v1['Year'] = final_v1['date'].dt.year
years = {year: pd.DataFrame() for year in final_v1.Year.unique()[::-1]}
for key in years.keys():
df=final_v1[final_v1.Year == key]
years[key] = df.groupby('exchange')['Return'].mean().mul(100).rename("Avg_return_{}".format(key))
df=pd.concat((years[i].to_frame() for i in years.keys()), axis=1)
df=df.sort_values(by="Avg_return_2023")
fig = make_subplots(rows=1, cols=3, shared_yaxes=True) # Adjust based on your actual subplot needs
for i, col in enumerate(df.columns):
x = df[col]
mask = x <= 0
# Adding traces for negative and positive returns
fig.add_trace(go.Bar(x=x[mask], y=df.index[mask], orientation='h',
text=x[mask], texttemplate='%{text:.2f}%', textposition='auto',
hovertemplate='Average Return in %{y} exchange = %{x:.4f}%',
marker=dict(color='red', opacity=0.7), name=col[-4:]),
row=1, col=i+1)
fig.add_trace(go.Bar(x=x[~mask], y=df.index[~mask], orientation='h',
text=x[~mask], texttemplate='%{text:.2f}%', textposition='auto',
hovertemplate='Average Return in %{y} exchange = %{x:.4f}%',
marker=dict(color='green', opacity=0.7), name=col[-4:]),
row=1, col=i+1)
# Update axes to include grid
fig.update_xaxes(title_text=f"{col[-4:]} Returns", title_standoff=25,
automargin=True, showgrid=True, gridcolor='LightGrey', row=1, col=i+1)
fig.update_yaxes(showgrid=True, gridcolor='LightGrey', row=1, col=i+1)
fig.update_layout(template="plotly_white", title='Yearly Average Returns',
hovermode='closest', height=600, width=900, showlegend=False)
fig.show()The bar charts show the yearly average returns for various cryptocurrencies in 2023 and 2024. In 2024, Dogecoin had the highest average return at 0.99%, followed by Bitcoin Cash at 0.89% and Solana at 0.84%. The lowest return was seen with Ripple at 0.09%. In 2023, Solana led the returns at 0.77%, with Cardano and Ripple yielding the lowest returns at 0.31% and 0.26%, respectively. Bitcoin showed more consistent performance with a 0.63% return in 2024 and a 0.28% return in 2023. Overall, the returns fluctuated year over year, with Dogecoin showing the most significant increase in returns from 2023 to 2024.
8 Conclusion
The cryptocurrency market in the US experienced significant fluctuations from 2023 to 2024, with Bitcoin, Ethereum, and other major coins witnessing both surges and corrections in their prices. The dataset analyzed in this overview provides insights into the market dynamics, including price movements, trading volumes, and correlations between various cryptocurrencies. The visualizations presented in this report offer a comprehensive view of the market, highlighting trends, patterns, and opportunities for investors. By examining the data and visualizations, investors can gain a better understanding of the market landscape, assess risks, and make informed investment decisions. The analysis underscores the importance of monitoring market trends, regulatory developments, and technological advancements to navigate the volatile cryptocurrency market successfully.